Dataset: Human Phenotype Ontology Gold Standard (HPO-GS)

Diagnostic Accuracy of a Custom Large Language Model on Rare Pediatric Disease Case Reports

NLP Tasks: Text Classification, Information Extraction, Question Answering

Method: Evaluating the diagnostic performance of three large language models (LLMs): GPT-4, Gemini Pro, and a custom-built LLM that integrates GPT-4 with the Human Phenotype Ontology (GPT-4 HPO)

Metrics:

  • Diagnostic accuracy (GPT-4: 13.1%, GPT-4 HPO: 8.2%, Gemini Pro: 8.2%)

Fine-tuning large language models for rare disease concept normalization

NLP Tasks: Named Entity Recognition, Information Extraction

Method: Fine-tuning Llama 2, an open-source large language model (LLM)

Metrics:

  • Accuracy (over 99%)
  • Accuracy (NAME: 10.2%, NAME+SYN: 36.1% with typos, NAME+SYN: 61.8% with typo-specific fine-tuning)
  • Accuracy (NAME: 11.2%, NAME+SYN: 92.7% for unseen synonyms)
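As a rough illustration of how an accuracy figure for concept normalization might be computed, the sketch below scores predicted ontology identifiers against gold identifiers by exact match. The function name, the exact-match criterion, and the sample identifiers are assumptions made for illustration; the paper's own evaluation protocol may differ.

```python
def normalization_accuracy(predictions, gold):
    """Fraction of terms whose predicted ID exactly matches the gold ID.

    Hypothetical helper: exact match after trimming whitespace and
    ignoring case is an assumed scoring rule, not the paper's stated one.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        p.strip().upper() == g.strip().upper()
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


# Illustrative identifiers only; three of four predictions match the gold ID.
gold_ids = ["HP:0001250", "HP:0001263", "HP:0000252", "HP:0001249"]
predicted_ids = ["HP:0001250", "HP:0001263", "HP:0000252", "HP:0000001"]
score = normalization_accuracy(predicted_ids, gold_ids)
```

Running the same function on separate test sets (clean names, names with typos, unseen synonyms) would give one accuracy value per condition, which is how the figures above are grouped.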

Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT

NLP Tasks: Information Extraction, Text Classification, Text Generation

Method: PhenoBCBERT and PhenoGPT models

Metrics:

  • Accuracy